Analysis of the Genetic Relationship between Atherosclerosis and Non-Alcoholic Fatty Liver Disease through Biological Interaction Networks

Non-alcoholic fatty liver disease (NAFLD) seems to have some molecular links with atherosclerosis (ATH); however, the molecular pathways that connect both pathologies remain unexplored to date. The identification of common factors is of great interest for exploring therapeutic strategies that improve the outcomes of affected patients. Differentially expressed genes (DEGs) for NAFLD and ATH were extracted from the GSE89632 and GSE100927 datasets, and common up- and down-regulated DEGs were identified. Subsequently, a protein–protein interaction (PPI) network based on the common DEGs was constructed. Functional modules were identified, and the hub genes were extracted. Then, a Gene Ontology (GO) and pathway analysis of the common DEGs was performed. The DEG analysis in NAFLD and ATH showed 21 genes that were regulated similarly in both pathologies. The common DEGs with high centrality scores were ADAMTS1 and CEBPA, which appeared to be down- and up-regulated in both disorders, respectively. For the analysis of functional modules, two modules were identified. The first one was oriented to post-translational protein modification, where ADAMTS1 and ADAMTS4 were identified, and the second one was mainly related to the immune response, where CSF3 was identified. These factors could be key proteins with an important role in the NAFLD/ATH axis.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is a disease characterized by the excessive accumulation of lipids (steatosis) in the liver unrelated to any viral infection or excessive alcohol consumption [1][2][3]. NAFLD currently encompasses several liver conditions of varying severity, ranging from simple steatosis or steatohepatitis to the development of cirrhosis or hepatocellular carcinoma [4][5][6], and it is one of the most common liver diseases in the West, with a prevalence of 20–30%, rising to 70% in people with obesity or diabetes [1,4]. During the last decade, NAFLD has gained importance as a hepatic manifestation of the metabolic syndrome, for which it was renamed in 2020 as Metabolic Dysfunction-Associated Fatty Liver Disease (MAFLD).
Twenty-four healthy controls (male = 8, female = 16; mean age = 38.67) were used for the analysis. Patients and healthy controls were recruited from the liver clinic or the Multiorgan Transplant Program, respectively, at the University Health Network, Toronto, Canada. The study was approved by the local Research Ethics Board and followed the guidelines of the 1975 Declaration of Helsinki and its revisions. All participants provided informed written consent. The patient samples were obtained by liver biopsy performed due to suspicion of NAFLD. The exclusion criteria were: alcohol consumption >20 g/day; any other liver disease; use of medications that could cause steatohepatitis, ursodeoxycholic acid, or any experimental drugs, antioxidants, or PUFA supplements in the 6 months prior to admission; pregnancy or breastfeeding. The samples from the healthy controls came from healthy organs (without steatosis or cirrhosis) that were being evaluated for living donor liver transplantation. The main exclusion criterion was any reason that excluded them from liver donation.
The GSE100927 series was submitted by Steenman M et al. [42], and it was developed on the GPL17077 platform (Agilent-039494 SurePrint G3 Human GE v2 8×60K Microarray 039381, Probe Name version). A total of 69 atherosclerotic tissue samples (male = 57, female = 12; mean age = 70.3) and 35 healthy tissue samples (male = 28, female = 7; mean age = 47.9) were used for the analysis. The atherosclerotic samples were obtained from patients undergoing carotid, femoral, and infrapopliteal endarterectomy; all diseased arteries presented with advanced atherosclerotic plaques. The healthy arteries without atherosclerotic lesions were obtained from organ donation. Written informed consent was obtained from both the patients and the donors' next of kin. Sample collection and handling were carried out according to the guidelines of the Medical and Ethical Committee of Nantes, France. Patients with non-atherosclerotic peripheral arterial disease, thrombosis, or restenosis were excluded.
The analysis of the DEGs (|log2 FC (fold change)| > 1 and adj. p-value < 0.05) for each gene expression profile resulted in a total of 270 up-regulated and 318 down-regulated genes for NAFLD (Figure 1A), while for ATH, the result was 421 up-regulated genes and 154 down-regulated genes (Figure 1B). The overlapping DEGs that matched in their direction of regulation in both pathologies amounted to 21 genes: 14 down-regulated (Figure 1C) and 7 up-regulated (Figure 1D).
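The overlap step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the gene lists are hypothetical toy data (`GENE_A`, `GENE_B`, and `GENE_C` are placeholders), and only the directions for ADAMTS1, CEBPA, and CSF3 follow what the text reports. A gene counts as common only if it is present in both DEG lists with the same direction of regulation.

```python
# Hypothetical toy data: gene -> direction of regulation ("up" or "down").
# GENE_A, GENE_B and GENE_C are placeholders, not genes from the study.
nafld_degs = {"ADAMTS1": "down", "CEBPA": "up", "CSF3": "down",
              "GENE_A": "up", "GENE_B": "down"}
ath_degs = {"ADAMTS1": "down", "CEBPA": "up", "CSF3": "down",
            "GENE_A": "down", "GENE_C": "up"}


def common_degs(first, second):
    """Return genes found in both mappings with the same direction, grouped by direction."""
    shared = {"up": set(), "down": set()}
    for gene, direction in first.items():
        # A gene regulated in opposite directions (GENE_A here) is not counted as common.
        if second.get(gene) == direction:
            shared[direction].add(gene)
    return shared


shared = common_degs(nafld_degs, ath_degs)
print("common up-regulated:", sorted(shared["up"]))
print("common down-regulated:", sorted(shared["down"]))
```

Requiring a matching direction is what separates the 21 common DEGs from a plain intersection of the two gene lists.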

Figure 1. Panels (A,B) show volcano plots with the DEGs identified in NAFLD and ATH, respectively (|log2 FC (fold change)| > 1 and adj. p-value < 0.05). Panels (C,D) list the common down- and up-regulated DEGs identified in both pathologies, respectively.

Protein-Protein Interaction (PPI) Network
The twenty-one DEGs common to NAFLD and ATH were used as a query in the STRING application within the Cytoscape software, with the aim of generating a PPI network. The confidence cutoff for the interacting proteins was set to 0.7, and the maximum number of additional interactors was 50. The result was a PPI network of 71 nodes with 572 interactions, as shown in Figure 2A. The seventy-one genes that constituted the PPI network were then used for enrichment analysis, which was performed using the DAVID database. A cutoff of at least 20 genes per term was established in order to obtain an overview of the PPI network. The main pathways were involved in O-glycosylation, post-translational protein modification, and cytokine signaling processes. Gene Ontology related them to the extracellular matrix and metallopeptidase activity (Table 1).

Functional Modules and Hubs
To analyze the topology of the constructed PPI network, betweenness centrality and degree centrality were calculated. Genes with higher centrality scores were then identified through the CytoNCA plug-in. The centrality analysis resulted in the hubs shown in Table 2. The PPI network modules were identified using the MCODE plug-in, and those with a score of at least four were selected. The result was the selection of three modules: module 1 contained 28 nodes and 377 edges, module 2 contained 16 nodes and 112 edges, and module 3 contained 4 nodes and 6 edges, with MCODE scores of 27.926, 14.933, and 4.000, respectively (Figure 2B).
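As a quick consistency check on the module scores, the sketch below recomputes them from the reported node and edge counts. It assumes the usual MCODE cluster score, namely the density of the module (without self-loops) multiplied by its number of nodes; this formula is an assumption not stated in the text, and the function name `mcode_score` is illustrative.

```python
def mcode_score(nodes, edges):
    """Cluster score assumed here: graph density (no self-loops) times number of nodes."""
    density = 2 * edges / (nodes * (nodes - 1))
    return density * nodes


# Node and edge counts as reported for the three selected modules.
modules = {"module 1": (28, 377), "module 2": (16, 112), "module 3": (4, 6)}

scores = {name: mcode_score(n, e) for name, (n, e) in modules.items()}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the counts give approximately 27.926, 14.933, and 4.000, so the scores are decimal values. Module 3 is a fully connected set of four nodes and sits exactly at the selection threshold of four.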
In order to further analyze the enrichment of core genes, we conducted a Gene Ontology and REACTOME pathway analysis for the modules selected in the previous step. The results showed that the 28 genes in module 1 were mainly related to O-glycosylation processes and related diseases, which probably affect the metabolism and organization of the extracellular matrix. In this context, metalloendopeptidase activity and the organization of collagen fibers could be involved. Integrin signaling mechanisms and extracellular matrix degradation could also be important.
On the other hand, the genes forming module 2 were mainly associated with interleukin signaling, which is an immune process that regulates the inflammatory response and cell migration through cytokine signaling. These cytokines could be regulated by mechanisms such as chemotaxis or the regulation of gene expression. In addition, genes belonging to module 2 appear to be linked to protein kinase signaling and growth factors. The negative regulation of cell proliferation also appears to be significant in this module.
Finally, four genes included in module 3 appear to be linked to calcium and ATP binding processes for muscle contraction purposes, but due to the low number of genes included in the module, most of the terms that appeared in the enrichment did not pass the adjusted p-value cutoff. Figure 3 shows the main enrichment results for the selected functional modules.

Checking Key Genes through Public Gene-Disease Association Databases
According to the results obtained in the previous sections (centrality analysis and functional module analysis), public gene-disease association databases were used to check the genes of interest. The DEGs common to the two pathologies with certain importance were ADAMTS1 (ADAM metallopeptidase with thrombospondin type 1 Motif 1) and CEBPA (CCAAT Enhancer Binding Protein Alpha), as genes with high centrality scores, and as for the functional module analysis, they were ADAMTS1 and ADAMTS4 for module 1 and CSF3 (Colony-Stimulating Factor 3) for module 2.
As shown in Table 3, the genes selected for screening in the gene-disease association databases showed a strong association with ATH and NAFLD pathologies (or very similar pathologies). The two metallopeptidases ADAMTS1 and ADAMTS4 were related to ATH in practically all of the databases we consulted, while their relationship with the liver was shared between oncologic processes and fatty liver. On the other hand, the genes CEBPA and CSF3 appeared to be closely related to NAFLD and cardiovascular processes, including ATH.

Table 3. Diseases related to genes common to ATH and NAFLD with high scores in centrality analysis or included in the functional modules of interest.

Discussion
Currently, the relationship between diseases of the cardiovascular system and liver disease is poorly explored. Despite this, studies connecting these types of diseases are gradually appearing. The studies linking ATH and NAFLD previously performed by other authors motivated us to perform this study exclusively using bioinformatics tools. The previous studies have suggested a connection between the two diseases through oxidative stress processes, inflammatory processes, coagulation factors, and hepatokine involvement, which would indicate an overlap in the molecular mechanisms shared by ATH and NAFLD [17,18].
The results of this study show that, using the gene expression data submitted by Arendt BM et al. [41] and by Steenman M et al. [42] for NAFLD and ATH, respectively, there is a total of 21 common DEGs shared by both pathologies, of which 14 genes were down-regulated and 7 genes were up-regulated. Among them, four genes were selected as especially relevant to the NAFLD/ATH axis due to their high score in the centrality analyses or due to their involvement in the functional modules identified in this study. Thus, ADAMTS1, ADAMTS4, CEBPA, and CSF3 were proposed as common genes between both pathologies.
ADAMTS1 is a metalloproteinase belonging to the ADAMTS family, which is involved in extracellular matrix (ECM) remodeling, a process regulated by a variety of modifiers, including enhancers and inhibitors [43]. This protein plays an important role in degrading ECM components and inhibiting angiogenesis [44][45][46][47] via its metalloprotease-dependent catalytic and thrombospondin-dependent regions [47][48][49]. In this context, it participates in the degradation of pro-collagen, proteoglycans, and the cartilage oligomeric matrix protein [50,51]. Moreover, the knockdown of ADAMTS1 seems to promote cell migration [52], which could be related to ATH development [53].
On the other hand, the down-regulation of ADAMTS1 has been associated with Metabolic Dysfunction-Associated Fatty Liver Disease (MAFLD) [54]. In relation to this, it has recently been shown that the inhibition of ADAMTS1 in adipose tissue leads to adipose tissue expansion, together with decreased insulin sensitivity and dysfunctional lipid metabolism [43,54]; thus, ADAMTS1 appears to contribute to the maintenance of lipid homeostasis [54]. These findings are consistent with the decreased expression of this protein in the adipose tissue of obese mice and with an inverse correlation of ADAMTS1 expression with body mass index in humans [43].
In exploring the link between NAFLD and ATH, ADAMTS1 may be involved in both pathologies through the metalloproteinase MMP1 signaling pathway. TFPI-2, a factor associated with ECM remodeling and ATH [55,56], has been identified as a binding partner for ADAMTS1 by Torres-Collado, A. et al. [43]. TFPI-2 participates in the inhibition of the matrix metalloproteinase MMP1 [56], which is involved in the degradation of type I and III fibrillar collagens and the matrix proteins that are the main components of the endothelial and subendothelial walls [57,58]. The degradation of these proteins allows the migration and subsequent expansion of leukocytes and VSMCs [59], leading to the development of atheroma plaques and the thickening of the intima-media layer [60,61]. Furthermore, MMP1 was found to be expressed in monocytes, Kupffer cells, and liver stellate cells early in the development of non-alcoholic steatohepatitis (NASH) [62,63], as well as in the hepatocyte progenitor cells participating in the process of angiogenesis in advanced NASH [63]. Therefore, the down-regulation of ADAMTS1 observed in our study could be related to ATH and NAFLD by leading to a decrease in TFPI-2 activity and overexpression of MMP1. In addition, ADAMTS1, which has also been identified as an inflammation-associated protein, is required for a balanced immune response [64,65]. It is known that both the innate and adaptive immune systems are involved in NAFLD pathogenesis, and crosstalk between the immune cells and liver cells participates in its initiation and progression [66]. In the case of ATH, the involvement of the immune system in its development is well known [67]. In accordance with these studies, the decreased levels of ADAMTS1 observed in our analysis of the NAFLD and ATH samples could be related to an abnormal immune response, contributing to the development of both disorders.
Accordingly, the PPI network enrichment analysis performed in this study already showed a link to the cytokine-related processes, immune system processes and protein modifications. These results suggest that the role of ADAMTS1 in the NAFLD/ATH axis could be explained by its participation in different signaling pathways.
However, the exact role of ADAMTS1 is not completely understood, since some studies report opposite results, showing a link between ADAMTS1 and the development of atherosclerotic plaques and ATH [68]. Moreover, ADAMTS1 has been found to be overexpressed in the intima of atherosclerotic plaques [45,69,70], as well as in the neutrophils and macrophages accumulated in the aortic tissues of patients with acute aortic dissection [71]. Regarding liver diseases, ADAMTS1 has been associated with the ability to activate TGF-β in liver fibrosis [72,73], as well as with NASH [74]. Given this controversy regarding the role of ADAMTS1 in the development of ATH and NAFLD, further studies are needed to elucidate the real role of this protein.
ADAM metallopeptidase with thrombospondin type 1 motif 4 (ADAMTS4) is an important analog of ADAMTS1. Studies have shown that several inflammation-associated signals also reduce ADAMTS4 expression, leading to a subsequent increased accumulation of ECM components, which could contribute to the fibrotic deposition of collagen [75]. In line with this, ADAMTS4 also appears to be down-regulated in our functional module results, which could be related to the development of NAFLD and ATH.
Regarding CEBPA or CEBPα, this protein is one of the factors that regulate the process of adipogenesis, together with PPARγ, and is involved in the sequential expression of adipocyte-specific proteins [76][77][78][79][80][81][82]. It is also essential for the myeloid lineage maturation process [83]. CEBPA appears to be expressed in inflammatory processes [83], although its exact function is unknown. Zhou J et al. observed, in 2019, that its overexpression increases the neutrophil population in a murine model [84]. Neutrophils are well known to respond to acute inflammation, but they are also linked to chronic inflammation [85,86]. Recent studies have related neutrophils to the formation of neutrophil extracellular traps (NETs) through a process called NETosis [86][87][88], which can promote the inflammatory process by stimulating the synthesis of ROS and proinflammatory cytokines by macrophages [89]. These NETs have been found in atherogenic plaques in both murine and human models [90][91][92], and the inhibition of NETosis has been linked to a decrease in the size of atherogenic plaques and an increase in plaque instability. Therefore, an overexpression of CEBPA may contribute to the development of ATH, thus promoting the increase in the neutrophil populations and the formation of those NETs.
On the other hand, Bristol J.A. et al. observed that CEBPA, together with CEBPB, binds to the TNFR1 promoter, increasing its expression and inducing an increase in TNF expression through a positive feedback mechanism [93]. TNFα is known to promote the development of ATH and NAFLD [29,94,95,96], so CEBPA may contribute to these diseases through the TNFα pathway.
In accordance with this, our results show CEBPA overexpression in both the NAFLD and ATH samples. This protein appears in module 2, which is closely related to interleukin signaling, immune response, and cytokine activity. However, further studies are needed to better understand the specific role of this protein, since some studies suggest a relationship with anti-inflammatory processes in murine models [84,97], in contrast to the studies mentioned above. Currently, very little information is available about this protein, so bioinformatics studies such as the one we have carried out are valuable because they can identify interesting potential biomarkers that are unexplored to date. In this sense, it seems promising to study the mechanism of action of CEBPA in depth to explore its potential role as a biomarker of NAFLD and ATH or as a possible therapeutic target.
Another gene of interest identified in our study is the CSF3 gene, which encodes a member of the IL-6 superfamily of cytokines. The encoded cytokine controls the production, differentiation, and function of granulocytes. The importance of this gene in NAFLD was described by Nam et al., who demonstrated that treatment with CSF3 in animal models had a possible protective effect by reducing hepatocyte apoptosis and by increasing cell survival and the anti-inflammatory function [98]. Regarding ATH, it has been shown that CSF3 therapy inhibits the atherosclerotic process in animal models [99]. In our study, the CSF3 gene does not appear with a high score in the centrality analysis, but it does appear as a member of the second functional module identified. According to the results of the DEG analysis, CSF3 appears to be down-regulated in our samples for both ATH and NAFLD, which agrees with the results of the published studies; this under-expression of CSF3 could be a possible cause of the development of NAFLD and ATH.
Our study has some limitations, such as the reduced number of samples used for comparison, which is mainly due to the low number of public data series with an adequate level of quality or information and to the lack of control datasets that can be used to compare the pathological samples with healthy samples. Another important limitation is the possible exclusion of potential targets involved in the NAFLD/ATH axis from the bioinformatics analysis due to the establishment of a specific cutoff point. However, the enrichment analysis performed mitigates this limitation, reducing the loss of potential candidates. Additionally, expression studies at the proteomic level would be useful to validate the obtained results, since only gene expression data and bioinformatic studies have been analyzed. The strength of our study lies in the generation of a strategy that is capable of combining and jointly exploiting the information available through different bioinformatic tools, generating valuable information for the identification of new potential targets related to these highly prevalent pathologies. More experimental studies are necessary to understand the specific role of the identified proteins in NAFLD and ATH. However, the first step is to identify good potential candidates to explore as biomarkers or therapeutic targets of these disorders.
In summary, our findings suggest that atherosclerotic processes could share common molecular pathways with the development of some liver disorders such as NAFLD. Our study identified some novel potential targets in the NAFLD/ATH axis, including ADAMTS1, ADAMTS4, CEBPA, and CSF3, using mainly bioinformatics tools. Considering that cardiovascular disease (including ATH) is the leading cause of death worldwide and has a major socioeconomic and health care impact, and that NAFLD has an increasingly higher incidence and is the most prevalent liver disease, affecting 70% of diabetic or obese patients, research into the common molecules involved in the development of these highly prevalent pathologies can have a great impact on clinical practice. The potential role of these molecules as early biomarkers of NAFLD and ATH could contribute to the development of preventive tools, with the aim of avoiding the appearance of irreversible complications in affected patients. On the other hand, as seen in the aforementioned studies, the modulation of these molecules could be used as a therapeutic strategy, slowing down or improving the symptomatology of these diseases. Although future experimental studies are needed to confirm the dual function of these proteins in both pathologies, this study provides valuable information on the utility of these proteins as potential biomarkers or therapeutic targets, which could improve the quality of life of affected patients.

Data Acquisition and Visualization and Identification of Differentially Expressed Genes
The NCBI-GEO is a public functional genomics data repository. By searching for keywords such as ATH and NAFLD in Homo sapiens, two series were selected that could be used to compare the differentially expressed genes of both pathologies.
To obtain the differentially expressed genes between the healthy and pathological samples of both pathologies, we used GEO2R. GEO2R is an interactive web-based tool that allows the users to compare datasets in the GEO series to determine the DEGs. |log2 FC (fold change)| > 1 and adj. p-value < 0.05 were considered to be statistically significant. DEGs with a log2 FC ≤ −1 or a log2 FC ≥ 1 were considered to be downregulated and up-regulated, respectively. The Benjamini-Hochberg False Discovery Rate (FDR) was used for p-value correction.
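The thresholding and the Benjamini-Hochberg correction described above can be illustrated with a small, self-contained Python sketch. It is not a reimplementation of GEO2R: the gene names, fold changes, and raw p-values are hypothetical, the function names are illustrative, and the strict inequalities follow the |log2 FC| > 1 criterion as an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' (not significant) using the stated cutoffs."""
    if adj_p < p_cutoff and log2_fc > fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2_fc < -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical example values, not taken from GSE89632 or GSE100927.
genes = ["G1", "G2", "G3", "G4"]
log2_fcs = [2.1, -1.5, 0.4, 1.2]
raw_p = [0.001, 0.01, 0.02, 0.5]

adj_p = benjamini_hochberg(raw_p)
calls = {g: classify(fc, q) for g, fc, q in zip(genes, log2_fcs, adj_p)}
print(calls)
```

In this toy example G3 is significant after correction but fails the fold-change cutoff, while G4 passes the fold-change cutoff but not the adjusted p-value cutoff; a gene must satisfy both criteria to be called a DEG.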

Protein-Protein Interaction (PPI) Network
To understand the interactions among the common down-regulated and up-regulated DEGs, the Cytoscape software [100] was used to analyze and visualize the biological interaction network. Cytoscape provides an open source environment for the large-scale integration of molecular interaction network data. In addition, Cytoscape enables integration with stringApp [101] to facilitate the visualization of network data from the STRING database [102]. To provide more robustness to the analysis, the 50 nearest interactors to the identified DEGs were included. The confidence score (cutoff) was set to 0.7 (high confidence).
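The effect of the confidence cutoff can be shown with a minimal Python sketch. The edge list is hypothetical (placeholder protein names and scores on a 0 to 1 scale), and the function `filter_edges` is illustrative rather than part of stringApp; the cutoff is treated as a minimum required score.

```python
# Hypothetical interactions: (protein_a, protein_b, combined confidence score in [0, 1]).
edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.70),
    ("P2", "P3", 0.45),
    ("P3", "P4", 0.69),
]


def filter_edges(edge_list, cutoff=0.7):
    """Keep only interactions whose confidence score reaches the cutoff."""
    return [(a, b, score) for a, b, score in edge_list if score >= cutoff]


kept = filter_edges(edges)
# Nodes that remain connected after filtering.
nodes = {name for a, b, _ in kept for name in (a, b)}
print(kept)
print(sorted(nodes))
```

A higher cutoff yields a sparser but more reliable network; here P4 drops out because its only interaction falls just below 0.7.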
Functional annotation was performed with the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [103].

Functional Modules and Hubs
Subsequently, CytoNCA [104], a Cytoscape plug-in, was used to perform a centrality analysis and identify essential proteins within the biological network. CytoNCA calculated the node scores by applying two centrality methods: degree centrality (defined as the number of links incident upon a node) and betweenness centrality (defined as the amount of influence a node has on the flow of information in a network).
In addition, the main functional modules were analyzed using Cytoscape's Molecular Complex Detection (MCODE) plug-in [105]. MCODE was used to perform graph-theoretic clustering to detect dense regions of protein-protein interaction networks based on connectivity data, most of which correspond to known protein complexes. The parameters were set as follows: degree cutoff = 2, max depth = 100, k-core = 2, and node score cutoff = 0.2. Only modules with an MCODE score of at least 4 were selected. A new functional enrichment analysis using DAVID was performed on those modules with the best scores.
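The two centrality measures can be made concrete with a short Python sketch. It is an illustration only, not CytoNCA's implementation: betweenness is computed here as the unnormalised shortest-path betweenness for an undirected, unweighted graph using Brandes' algorithm, which is one common formalisation of "influence on the flow of information", and the network with placeholder nodes P1 to P5 is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of links incident upon each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised shortest-path betweenness (Brandes' algorithm, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from the source.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted twice in an undirected graph.
    return {v: value / 2 for v, value in score.items()}


# Hypothetical toy network (adjacency sets); P1..P5 are placeholder proteins.
network = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4", "P5"},
    "P4": {"P3"},
    "P5": {"P3"},
}

degree = degree_centrality(network)
betweenness = betweenness_centrality(network)
print(degree)
print(betweenness)
```

In this toy network P3 has both the highest degree and the highest betweenness, which is the kind of node a hub analysis would flag.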

Checking Key Genes through Public Gene-Disease Association Databases
The DEGs resulting from the above analyses, i.e., those that scored highly in the centrality tests or were part of the identified functional modules, were checked by text mining using databases such as DisGeNET, MalaCards, and HuGE Genopedia.
The DisGeNET database was used to obtain the genes associated with ATH and NAFLD. DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases [106]. The latest update available is version 7 (June 2020) containing 1,134,942 gene-disease associations (GDAs) between 21,671 genes and 30,170 diseases and traits. The data contained in this database come from the most popular repositories used by the scientific community. In addition, these data are expanded and enriched with information extracted from scientific literature using state-of-the-art text-mining tools.
MalaCards is an integrated database of human pathologies and their annotations. This database is organized into disease cards containing information, annotations, connections between other diseases, as well as genes associated with each disease. It currently contains 22,091 disease entries, which come from 75 sources [107].
HuGE Genopedia is a database that focuses on genetic association studies summarized in Human Genome Epidemiology (HuGE). Following its latest available data update, it contained 16,498 genes and 3416 diseases. Using a single gene as a query, it provides summary information on diseases that have been studied in association with the given query [108].

Conclusions
Scientific evidence suggests that the development of atherosclerotic processes may share molecular mechanisms with the development of NAFLD. Supporting this evidence, our results indicated two main targets that were highlighted as hubs in the bioinformatics analyses that we carried out: ADAMTS1 and CEBPA. Additional targets could also be considered, although with lower scores than those of the two mentioned proteins. This molecular relationship between both pathologies opens the door to the design of therapeutic strategies that can contribute to improving the quality of life of affected patients, or even to the development of preventive strategies for the population at a higher risk of suffering from these complications.

Conflicts of Interest:
The authors declare no conflict of interest.